-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[For merging on 30.09] landing page is canonical #513
Conversation
The latest push removes the links to nested pages in favor of the canonical variants:
etc. I did not switch the canonical page yet. Testing shows that it works so far. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just submitted comments that should make the changes here live. Can someone from @acl-org/anthology take a look and approve?
{{- end -}} | ||
{{- $bibfile := printf "/papers/%s/%s/%s.bib" (slicestr $volume_id 0 1) $volume_id .Params.anthology_id -}} | ||
{{- if (fileExists (printf "/data-export/%s" $bibfile)) -}} | ||
<a class="badge badge-secondary align-middle mr-1" href="{{ $bibfile | relURL }}" data-toggle="tooltip" data-placement="top" title="Export to BibTeX"> | ||
<a class="badge badge-secondary align-middle mr-1" href="{{ $paper.url }}.bib" data-toggle="tooltip" data-placement="top" title="Export to BibTeX"> | ||
bib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bib path is now ${ANTH_ID}.bib
, e.g., https://aclweb.org/anthology/P19-1002.bib
, instead of the nested version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're no longer using $bibfile
, but the link is still within an if clause that checks for $bibfile
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine though, right? If the bibfile exists, we generate the appropriate link to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't change this because I'm not sure how to fix it. If the bibfile is present, we generate the canonical link to it, which is handled by a redirect internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the question is whether we still want a check for the existence of the file in the Hugo template.
If we do, then this current solution is suboptimal IMO since it requires us manually remembering and updating the actual local path to the file (in the $bibfile := ...
line), in case it ever changes.
If we do not, we should just throw out the surrounding lines altogether.
hugo/layouts/papers/list-entry.html
Outdated
@@ -28,7 +25,7 @@ | |||
{{ if eq hugo.Environment "development" }} | |||
<span class="badge badge-light align-middle">{{ .Params.anthology_id }}</span> | |||
{{ end }} | |||
<strong><a class="align-middle" href="{{ .RelPermalink }}">{{ $paper.title_html | safeHTML }}</a></strong> | |||
<strong><a class="align-middle" href="{{ $paper.url }}/">{{ $paper.title_html | safeHTML }}</a></strong> | |||
<br /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link to the canonical page (https://aclweb.org/anthology/P19-1002/
) instead of the nested version. This will get rid of the double search results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same concern as above: URL can refer to external content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. url
now points to the page's Anthology URL; pdf
points to the PDF (which can be internal or external).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The downside of how this is now handled is that it completely breaks local browsing of the site, since links always point to the live anthology.
I see two options here:
-
Making the behaviour dependent on Hugo's "environment" variable, using
.RelPermalink
when in development mode and$paper.url
otherwise. I don't really like this, as diverging behaviour has the potential to hide bugs. -
Generating the flat structure directly with Hugo, as discussed in Should the canonical URL be the landing page? #480, instead of relying on redirects.
@@ -44,7 +44,7 @@ | |||
{{ $bibfile := printf "/papers/%s/%s/%s.bib" (slicestr $volume_id 0 1) $volume_id $anthology_id }} | |||
<section id="main"> | |||
<h2 id="title"> | |||
<a href="{{ $paper.url }}">{{ $paper.title_html | safeHTML }}</a> | |||
<a href="{{ $paper.url }}/">{{ $paper.title_html | safeHTML }}</a> | |||
</h2> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The paper page has a /
at the end.
{{ end }} | ||
{{ $endfile := printf "/papers/%s/%s/%s.endf" (slicestr $volume_id 0 1) $volume_id $anthology_id }} | ||
{{ if (fileExists (printf "/data-export/%s" $endfile)) }} | ||
<a class="btn btn-secondary btn-sm" href="{{ $endfile | relURL }}">EndNote</a> | ||
<a class="btn btn-secondary btn-sm" href="{{ $paper.url}}.endf">EndNote</a> | ||
{{ end }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the canonical URL prefix instead of the deeply nested prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same concern as with $bibfile
above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So do you suggest that we generate bib
, endf
, etc links in the YAML, and only use those if present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I ended up doing for clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice, but has the same issue with local browsing–the links will always point to the live server.
hugo/static/.htaccess
Outdated
# The Paper metadata page (e.g., P17-1069/ -> papers/P/P17/P17-1069/index.html) | ||
RewriteRule ^([A-Za-z])([0-9][0-9])\-([0-9][0-9][0-9][0-9])\/$ papers/$1/$1$2/$1$2-$3/ [L,NC] | ||
# The Paper metadata page (e.g., P17-1069/ loads papers/P/P17/P17-1069/index.html) | ||
RewriteRule ^([A-Za-z])([0-9][0-9])\-([0-9][0-9][0-9][0-9])\/$ papers/$1/$1$2/$1$2-$3/index.html [L,NC] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loads the index file when https://aclweb.org/anthology/P19-1002/
(with a slash) is requested.
Besides my individual comments, I really don't like this mixing of URL logic. URLs should either be constructed in the YAML or in the template, but right now it's a mixture of both, which seems very error-prone to me. (And in fact I believe this will break things, since we have external links as well.) |
Thanks @mbollmann, it sounds like this needs more work. What do you think about just pushing up #520 in the meantime? (I've actually had this live for some time). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's make sure this functionality won't break external links, and ideally refactor it to the YAML if feasible.
One way to do this would be for the YAML to have the following fields:
We then just print their values. Thoughts? I think we can use this PR as the official PR for #480. |
@mbollmann, do you have time for a review? You are more qualified, but otherwise I will have a look at it next week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor functional comments, but my main issue with this is how it breaks local browsing of the site, as detailed in the comments.
{{- end -}} | ||
{{- $bibfile := printf "/papers/%s/%s/%s.bib" (slicestr $volume_id 0 1) $volume_id .Params.anthology_id -}} | ||
{{- if (fileExists (printf "/data-export/%s" $bibfile)) -}} | ||
<a class="badge badge-secondary align-middle mr-1" href="{{ $bibfile | relURL }}" data-toggle="tooltip" data-placement="top" title="Export to BibTeX"> | ||
<a class="badge badge-secondary align-middle mr-1" href="{{ $paper.url }}.bib" data-toggle="tooltip" data-placement="top" title="Export to BibTeX"> | ||
bib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the question is whether we still want a check for the existence of the file in the Hugo template.
If we do, then this current solution is suboptimal IMO since it requires us manually remembering and updating the actual local path to the file (in the $bibfile := ...
line), in case it ever changes.
If we do not, we should just throw out the surrounding lines altogether.
hugo/layouts/papers/list-entry.html
Outdated
@@ -28,7 +25,7 @@ | |||
{{ if eq hugo.Environment "development" }} | |||
<span class="badge badge-light align-middle">{{ .Params.anthology_id }}</span> | |||
{{ end }} | |||
<strong><a class="align-middle" href="{{ .RelPermalink }}">{{ $paper.title_html | safeHTML }}</a></strong> | |||
<strong><a class="align-middle" href="{{ $paper.url }}/">{{ $paper.title_html | safeHTML }}</a></strong> | |||
<br /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The downside of how this is now handled is that it completely breaks local browsing of the site, since links always point to the live anthology.
I see two options here:
-
Making the behaviour dependent on Hugo's "environment" variable, using
.RelPermalink
when in development mode and$paper.url
otherwise. I don't really like this, as diverging behaviour has the potential to hide bugs. -
Generating the flat structure directly with Hugo, as discussed in Should the canonical URL be the landing page? #480, instead of relying on redirects.
{{ end }} | ||
{{ $endfile := printf "/papers/%s/%s/%s.endf" (slicestr $volume_id 0 1) $volume_id $anthology_id }} | ||
{{ if (fileExists (printf "/data-export/%s" $endfile)) }} | ||
<a class="btn btn-secondary btn-sm" href="{{ $endfile | relURL }}">EndNote</a> | ||
<a class="btn btn-secondary btn-sm" href="{{ $paper.url}}.endf">EndNote</a> | ||
{{ end }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice, but has the same issue with local browsing–the links will always point to the live server.
I'm not sure I like either (1) or (2) above:
What about
|
Yes, we then have a lot of files in one directory. But this does not slow down anything, it is "only" a personal preference to have files sorted into directories (don't mean it insulting, I share the general preference). In fact, we lose complexity and the website might actually become faster with a flat structure. This reduced complexity might also make it easier to reason about bugs as the html files directly show the problems and no mental step through redirects is necessary. To not have all files in a single directory, we could instead also have a sub-directory for HTML, one for bibtex, etc. I just fear that we add more complexity (docker containers, switches on the environment) where we could reduce it. Sure, a complex system can look beautiful, but a simple system is simple. |
FWIW, I believe that changing the output locations of the paper pages should be as simple as adding [permalinks]
papers = "/:filename/" to |
One worry is that our hosting provider might not have the same modern file system. What if we kept the current nested links for all generated files (bib, xml, endnote), but also added the top-level RewriteRule? That way they will both work. It's really only the |
Okay, but this wouldn't work for the landing page. What if we:
This adds a bit more structure to the top-level and has direct links. It does mean that the generated files are accessible from two paths, but I don't think that's a problem for those kinds of files. Second, does anyone have time to do this before October 1? I will own it since I am the one who announced the date, but maybe it's quicker for someone else. |
Ext4 has been a stable file system since 2008, that is more than ten years ago. I am sure they don't host this on FAT32 or similar ... To test whether speed is an issue, you can run this in the With your second approach, we have more structure but would probably have a similar slow-down if the server cannot handle thousands of entries in a single directory. |
Unfortunately not me, I am lying in my bed with a fever :-( Do whatever you want to achieve this and ignore all my comments if it makes your life easier. |
Sorry to hear it, @akoehn! I hope you feel better soon. I ran some tests on the server, and there is no problem with all the files in one directory. So let's go that route. The only choice left is whether to put generated files in subdirectories or not (e.g., bib files under
|
Feel better! Hope you get well soon!
…On Sat, Sep 28, 2019 at 12:23 AM Arne Köhn ***@***.***> wrote:
Second, does anyone have time to do this before October 1?
Unfortunately not me, I am lying in my bed with a fever :-(
Do whatever you want to achieve this and ignore all my comments if it
makes your life easier.
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<#513?email_source=notifications&email_token=AABU72ZNCKZP7ETL6UTZ7TLQLYXRFA5CNFSM4IMQ24I2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7ZNIBI#issuecomment-536007685>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AABU72YLDPYFIVP5P5KVEKTQLYXRFANCNFSM4IMQ24IQ>
.
|
Last update: I've looked into this a bit, and I don't know hugo well enough to be able to implement the single directory by Monday. I'm going to suggest that we use the current proposal (which breaks local builds) until someone has a minute to fix it according our discussion here. |
I've noticed a couple of instances where links don't point to the right location yet; for example, the metadata field I can try to fix those, but debugging the website without a working local build is substantially harder... |
How hard is it to fix the local builds? I’m happy to have everything in one directory but I don’t know hugo well enough. |
For the pages it should be as simple as the config file change I've wrote about above, but I'd want to test it and make sure that all the links point to the correct places. The bibliography files shouldn't be too hard either, but the same caveat applies. I might have some time later tonight, but no promises. What time exactly would you want to push this live? |
I don't think the time is important. It would be good if we could make it available sometime on the 1st, though. |
I've added two commits that should do what we discussed, and appear to produce the right thing locally. We could put bibliography files in a subdirectory as well (in that case I'd go with the |
Great, I'll test this out. |
It always used to link to the PDF though, so this would be a functional change. |
Also consider pages with external links, like LREC publications, which also point to the PDF in the "URL" field: |
Okay, you've convinced me. I'll merge this tomorrow night! |
* Removed nested links in favor of canonical variants * Flatten hierarchy of paper pages, change ANTHOLOGY_URL -> _PDF * Fix URLs that should point to PDFs * Fix volume-level URLs that should point to PDFs * Generate bibliography files in flat structure
* removed redirect rules (should have been in acl-org#513) * added nodalida to events page (closes acl-org#544)
This fix redirects paper pages like
to the canonical slashed variant
I hope this will stop people from accidentally sharing the nested URL forms which I find ugly and redundant. Note that this is just a small change; it's related to #480 but doesn't yet change the canonical URL. (Also note that I have tested this out and it is actually already in place).